Source: https://www.expertafrica.com/wildlife/spotted-hyena/africa
Today, we will look at the distribution and environmental suitability of Spotted Hyenas. The occurrence is downloaded from GBIF. The environmental variables are bioclimatic variables:
| code | name |
|---|---|
| BIO1 | Annual Mean Temperature |
| BIO2 | Mean Diurnal Range (Mean of monthly (max temp - min temp)) |
| BIO3 | Isothermality (BIO2/BIO7) (×100) |
| BIO4 | Temperature Seasonality (standard deviation ×100) |
| BIO5 | Max Temperature of Warmest Month |
| BIO6 | Min Temperature of Coldest Month |
| BIO7 | Temperature Annual Range (BIO5-BIO6) |
| BIO8 | Mean Temperature of Wettest Quarter |
| BIO9 | Mean Temperature of Driest Quarter |
| BIO10 | Mean Temperature of Warmest Quarter |
| BIO11 | Mean Temperature of Coldest Quarter |
| BIO12 | Annual Precipitation |
| BIO13 | Precipitation of Wettest Month |
| BIO14 | Precipitation of Driest Month |
| BIO15 | Precipitation Seasonality (Coefficient of Variation) |
| BIO16 | Precipitation of Wettest Quarter |
| BIO17 | Precipitation of Driest Quarter |
| BIO18 | Precipitation of Warmest Quarter |
| BIO19 | Precipitation of Coldest Quarter |
The habitat of Spotted Hyenas is mainly sub-Saharan Africa.
Because we didn’t learn spatial analysis yet, we will focus on non-spatial analysis today:
Tasks:
read.csv to read occurrence table and assign to object occurrence.country, use function as_country_name in package passport to get country names based on countryCode.read_csv to read variables table and assign to object variables.bio**.occ_var.Hint: we could directly use URL as the path to read tables.
library(readr)
library(dplyr)
library(passport)
# Read occurrence
occurrence <- read.csv('https://agroimpacts.github.io/geospaar/hyenas_occurrence.csv',
stringsAsFactors = F) %>%
dplyr::select(gbifID, species, countryCode, decimalLatitude,
decimalLongitude, month, year) %>%
mutate(country = as_country_name(countryCode))
# Read variables
variables <- read_csv('https://agroimpacts.github.io/geospaar/hyenas_variables.csv') %>%
dplyr::select(gbifID,
wc2.1_2.5m_bio_1, wc2.1_2.5m_bio_4,
wc2.1_2.5m_bio_5, wc2.1_2.5m_bio_6,
wc2.1_2.5m_bio_12, wc2.1_2.5m_bio_13,
wc2.1_2.5m_bio_14, wc2.1_2.5m_bio_15)
names(variables) <- c('gbifID',
paste0('bio', c(1, 4:6, 12:15)))
# Join two tables, remove NAs and duplicates
occ_var <- inner_join(occurrence, variables) %>%
na.omit() %>% distinct()
head(occ_var)
## gbifID species countryCode decimalLatitude decimalLongitude month
## 1 3044961479 Crocuta crocuta TZ -2.333333 34.83333 12
## 2 3044944191 Crocuta crocuta ZA -28.037928 32.16869 10
## 3 3044939388 Crocuta crocuta KE -0.345021 36.81288 2
## 4 3044813070 Crocuta crocuta KE -2.637597 37.24065 9
## 5 3044804214 Crocuta crocuta ZA -24.116664 31.63650 9
## 6 3044803667 Crocuta crocuta ZA -24.605932 31.73148 12
## year country bio1 bio4 bio5 bio6 bio12 bio13 bio14 bio15
## 1 2007 Tanzania 20.98183 84.84062 28.240 12.684 827 132 19 54.41696
## 2 2007 South Africa 21.26133 259.94644 29.528 11.228 941 141 29 47.39932
## 3 2021 Kenya 14.25450 76.70544 22.020 6.976 1260 227 53 54.45908
## 4 2017 Kenya 21.76983 146.04869 31.040 13.460 634 138 3 83.16368
## 5 2007 South Africa 22.77200 318.79556 32.340 9.424 511 94 5 80.18640
## 6 2020 South Africa 21.68317 311.25281 30.928 8.560 601 106 8 76.95921
First, let’s have a look at the potential bias of observations. One hypothesis:
Tasks:
month by filling the ___.library(ggplot2)
ggplot(occ_var) +
geom_histogram(aes(x = as.factor(month)),
stat = 'count',
color = 'transparent',
fill = 'blue') +
labs(x = 'Month', y = 'Count') +
theme_bw()
As we could see from the figure that there are more observations during July to November. This might because the weather is nicer for people to come out to meet Hyenas. Or it might because Hyenas have more activities during these months. So if we want to fit a model or so, it might be good to resample the observations to make sure they distribute evenly across each month.
There might be some similar bias of observation spatially, e.g. distance to roads. We will look at in unit2.
Tasks:
year from scratch.occ_var %>% filter(year >= 2000) %>%
ggplot() +
geom_histogram(aes(x = as.factor(year)),
stat = 'count',
color = 'transparent',
fill = 'red') +
labs(x = 'Year', y = 'Count') +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
2020 is very interesting that might relate to the outbreak of COVID.
Distribution by countries
Tasks:
___ to make the code work.occ_var %>%
ggplot(aes(forcats::fct_infreq(country))) +
geom_bar(aes(fill = as.factor(month)),
color = 'transparent') +
labs(x = 'Country', y = 'Count') +
scale_fill_brewer(palette = "Paired", name = 'Month') +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
Randomly select 5 years to plot
Since we don’t learn spatial analysis yet. So now we just plot coordinates in Cartesian coordinate system like a regular plot.
Task:
occ_var with year >= 2000.sample function and %in% operator.set.seed(10)
occ_var %>%
filter(year >= 2000) %>%
filter(year %in% sample(unique(year), 6)) %>%
ggplot() +
geom_point(aes(x = decimalLongitude,
y = decimalLatitude,
col = as.factor(year)),
size = 2) +
scale_color_brewer(palette = "Dark2", name = 'Year') +
labs(x = 'Longitude', y = 'Latitude') +
theme_bw()
Make animations by year
Now let’s learn how to make an animated figure.
library(magick)
library(gganimate)
p1 <- occ_var %>%
filter(year >= 2000) %>%
mutate(year = as.integer(year)) %>%
ggplot() +
geom_point(aes(x = decimalLongitude,
y = decimalLatitude),
size = 2, color = 'red') +
scale_color_brewer(palette = "Dark2", name = 'Year') +
labs(x = 'Longitude', y = 'Latitude', title = 'Year: {frame_time}') +
transition_time(year) +
theme_bw()
animate(p1, renderer = magick_renderer(),
fps = 5, height = 3.5, width = 3.5,
units = "in", res = 150)
Tasks:
bio columns.Variable of variable names and another column Mean of mean values.kableExtra to show the table.library(kableExtra)
occ_var %>%
dplyr::select(contains('bio')) %>%
apply(2, mean) %>% data.frame(Mean = .) %>%
mutate(Variable = rownames(.)) %>%
dplyr::select(Variable, Mean) %>%
kbl(row.names = F, caption = 'Variable mean') %>%
kable_paper("striped",
html_font = 'helvetica',
font_size = 12,
full_width = F)
| Variable | Mean |
|---|---|
| bio1 | 21.52944 |
| bio4 | 210.89461 |
| bio5 | 30.66609 |
| bio6 | 10.88713 |
| bio12 | 769.73049 |
| bio13 | 147.28969 |
| bio14 | 11.33184 |
| bio15 | 76.49962 |
Tasks:
bio columns.pivot_longer to put the names of variables to column var, and values of variables to column value.group: if bio1, bio4, bio5, has value “Temperature”, if not, has value “Rainfall”. Hint: use ifelse.var and value, and make facets based on group.library(ggpubr)
library(tidyr)
occ_var %>%
dplyr::select(contains('bio')) %>%
pivot_longer(cols = contains('bio'),
names_to = 'var', values_to = 'value') %>%
mutate(group = ifelse(var %in% paste0('bio', 1:6),
'Temperature', 'Rainfall')) %>%
ggplot() +
facet_wrap(~ group, scale = 'free') +
labs(x = 'Environmental variable', y = 'Value') +
geom_boxplot(aes(x = var, y = value)) +
theme_cleveland()
Ecological niche
Tasks:
geom_hline.geom_vline.occ_var %>%
ggplot(aes(x = bio1, y = bio12)) +
geom_point(col = 'blue', size = 2) +
labs(x = 'Annual mean temperature',
y = 'Annual mean rainfall') +
geom_vline(xintercept = min(occ_var$bio1),
col = 'red', linetype = 2) +
geom_vline(xintercept = max(occ_var$bio1),
col = 'red', linetype = 2) +
geom_hline(yintercept = min(occ_var$bio12),
col = 'red', linetype = 2) +
geom_hline(yintercept = max(occ_var$bio12),
col = 'red', linetype = 2) +
theme_light()